3 research outputs found

Document analysis at DFKI. - Part 1: Image analysis and text recognition

Author: Ali Majdi Ben Hadj
Fein Frank
Hönes Frank
Jäger Thorsten
Weigel Achim
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1995

Document analysis is responsible for essential progress in office automation. This paper is part of an overview of the combined research efforts in document analysis at the DFKI. Common to all document analysis projects is the global goal of providing a high-level electronic representation of documents in terms of iconic, structural, textual, and semantic information. These symbolic document descriptions enable "intelligent" access to a document database. Currently there are three ongoing document analysis projects at DFKI: INCA, OMEGA, and PASCAL2000/PASCAL+. Though the projects pursue different goals in different application domains, they all share the same problems, which have to be resolved with similar techniques. For that reason the activities in these projects are bundled to avoid redundant work. At DFKI we have divided the problem of document analysis into two main tasks, text recognition and text analysis, which are themselves divided into a set of subtasks. The work of the document analysis and office automation department at DFKI is presented in a series of three research reports. The first report discusses the problem of text recognition, the second that of text analysis. In a third report we describe our concept for a specialized document analysis knowledge representation language. The present report describes the activities dealing with the text recognition task. Text recognition covers the phase starting with capturing a document image up to identifying the written words. This comprises the following subtasks: preprocessing the pictorial information, segmenting into blocks, lines, words, and characters, classifying characters, and identifying the input words. For each subtask several competing solution algorithms, called specialists or knowledge sources, may exist. To control and organize these specialists efficiently, an intelligent situation-based planning component is necessary, which is also described in this report.
It should be mentioned that the planning component is also responsible for controlling the overall document analysis system, not the text recognition phase only.

Universaar

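Anforderungen an ein System zur Dokumentanalyse im Unternehmenskontext: Integration von Datenbeständen, Aufbau- und Ablauforganisation [Requirements for a Document Analysis System in the Enterprise Context: Integration of Data Repositories, Organizational Structure, and Process Organization]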

Author: Baumann Stephan
Hadj Ali Majdi Ben
Lichter Jürgen
Malburg Michael
Meyer auf'm Hofe Harald
Wenzel Claudia
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1997

Workflow management systems are increasingly used in office environments to handle business processes efficiently. However, the paperless office, promoted as early as the mid-1970s, still remains a utopia today, because even the ubiquitous use of computers in the office has not reduced the throughput of paper documents. In particular, the handling of paper-intensive processes depends to a large degree on identifying and preparing the information contained in the documents. Yet such data, for example from incoming mail, still has to be entered by hand. This document sets out requirements for a system intended to overcome this media break. Techniques from the fields of document analysis and document understanding are integrated into the workflow context and use the knowledge available there to improve recognition quality. Restricting the current context, for example to the set of open cases, is intended to increase recognition precision. The description of the system requirements follows the guidelines of the V-Modell.

Universaar

Message Extraction from Printed Documents - A Complete Solution -

Author: Achim Weigel
Andreas Dengel
Claudia Wenzel
Majdi Ben Hadj Ali
Michael Malburg
Stephan Baumann
Thorsten Jäger

The task to be solved within our core research was the design and development of a document analysis toolbox covering typical document analysis tasks such as document understanding, information extraction and text recognition. In order to prove the feasibility of our concepts, we have developed the prototypical analysis system OfficeMAID. The system analyzes documents, as used in the daily work of a purchasing department, by a-priori knowledge about workflows and document features. In this way the system provides goal-directed information extraction, shallow understanding and process identification for given documents (paper, fax, e-mail). This work has been supported by a grant from the BMBF (ITW 9702).

1 Introduction

Generally, printed documents are neither generated for scanning and automatic processing nor for easy integration into electronic workflows. Therefore, it is hard to transform them adequately for further processing by electronic means. This is the reason why DMS --- in..

CiteSeerX